Chinese text segmentation for text retrieval: Achievements and problems

Zimin Wu; Gwyneth Tseng

doi:10.1002/(sici)1097-4571(199310)44:9<532::aid-asi3>3.0.co;2-m

Journal of the American Society for Information Science ◽

1993 ◽

Vol 44 (9) ◽

pp. 532-542 ◽

Cited By ~ 41

Author(s):

Zimin Wu ◽

Gwyneth Tseng

Keyword(s):

Chinese Text ◽

Text Retrieval ◽

Text Segmentation

ACTS: An automatic Chinese text segmentation system for full text retrieval

Journal of the American Society for Information Science ◽

10.1002/(sici)1097-4571(199503)46:2<83::aid-asi2>3.0.co;2-0 ◽

1995 ◽

Vol 46 (2) ◽

pp. 83-96 ◽

Cited By ~ 22

Author(s):

Zimin Wu ◽

Gwyneth Tseng

Keyword(s):

Chinese Text ◽

Full Text ◽

Text Retrieval ◽

Text Segmentation ◽

Full Text Retrieval

Applying the Bell’s Test to Chinese Texts

Entropy ◽

10.3390/e22030275 ◽

2020 ◽

Vol 22 (3) ◽

pp. 275

Author(s):

Igor A. Bessmertny ◽

Xiaoxi Huang ◽

Aleksei V. Platonov ◽

Chuqiao Yu ◽

Julia A. Koroleva

Keyword(s):

Quantum Entanglement ◽

Chinese Text ◽

Search Engines ◽

Text Processing ◽

Word Segmentation ◽

Significant Problem ◽

Text Segmentation ◽

Text Documents ◽

Segmentation Algorithms ◽

Chinese Texts

Search engines find documents that contain patterns from a query. This approach works for alphabetic languages such as English, but Chinese is highly dependent on context. A significant problem in Chinese text processing is the absence of spaces between words, so the text must be segmented into words before any other processing. Algorithms for Chinese text segmentation should take context into account; that is, how a word is segmented depends on the surrounding ideograms. Because existing segmentation algorithms are imperfect, we consider an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to ranking Chinese text documents by their relevance to the query. In particular, the approach uses Bell’s test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
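
To make the HAL step in the abstract more concrete, the following is a minimal sketch of how a HAL-style co-occurrence table might be built over unsegmented Chinese text. It is not the authors' implementation: the function name `hal_matrix`, the window size, the sample string, and the choice to treat each character as a token are all assumptions made for illustration. The weighting (window minus distance plus one) follows the commonly described HAL scheme, in which nearer preceding tokens contribute more.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Build a HAL-style co-occurrence table (illustrative sketch).

    hal[target][context] accumulates a weight for every context token
    that appears before the target within `window` positions. A token
    at distance d contributes window - d + 1, so closer tokens count more.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            hal[target][tokens[j]] += window - distance + 1
    return hal


# Assumption: each character is one token, because the text is unsegmented.
text = "中文分词中文检索"
hal = hal_matrix(list(text), window=3)

# Context weights of the characters that precede "文" in the sample text.
print(dict(hal["文"]))
```

Working at the character level sidesteps the segmentation step entirely, which is in the spirit of building context from the n-grams around the query rather than relying on an imperfect segmenter. How the resulting vectors feed into Bell's test is not specified in the abstract and is not attempted here.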